STATISTICAL SUMMARY AND TREND EVALUATION OF AIR QUALITY
DATA FOR CLEVELAND, OHIO, IN 1967 TO 1971 -
TOTAL SUSPENDED PARTICULATE, NITROGEN DIOXIDE,
AND SULFUR DIOXIDE

by Harold E. Neustadter, Steven M. Sidik, and John C. Burr, Jr.* 

Lewis Research Center 

SUMMARY 

Air-quality data (total suspended particulate, nitrogen dioxide, and sulfur dioxide)
for Cleveland, Ohio, for the period of 1967 to 1971 have been collated and subjected to 
statistical analysis. Total suspended particulate is clearly lognormally distributed, 
while sulfur dioxide and nitrogen dioxide are reasonably approximated by lognormal 
distributions. The air-quality standards for the State of Ohio are met only sporadically 
by sulfur dioxide in isolated residential neighborhoods. Nowhere in Cleveland are the 
standards for total suspended particulate or nitrogen dioxide met. Definite improvement 
in air quality has taken place in the industrial valley, while in the rest of the city, only 
sulfur dioxide has shown consistent improvement. 

A pollution index has been introduced which directly displays information regarding 
the degree to which the environmental air conforms to the mandated standards. As such, 
it is a useful tool in air-quality monitoring programs. 


INTRODUCTION 

This report presents the results of various statistical analyses of data obtained by 
the Air Pollution Control Division (APCD) of Cleveland, Ohio. It contains a tabulation 
of averages, statistics relevant to lognormal distributions, and goodness-of-fit
statistics. In addition, a pollution-level index is introduced which relates the measured
pollution levels over a year to the existing air-quality standards.


*Air Pollution Control Division, Cleveland, Ohio. 



The air-sampling program of APCD is currently in its sixth year. Twenty-four-hour
samplings have been made of total suspended particulate (TSP) since January 1967,
and of nitrogen dioxide (NO2) and sulfur dioxide (SO2) since January 1968. The
sampling methods used are high-volume air sampling, Jacobs-Hochheiser, and West-Gaeke
sulfuric acid, respectively. The geographic deployment of sampling sites is shown in
figure 1. The meandering heavy line in the center of the city is the Cuyahoga River,
about which is centered most of the region's heavy industry.

At present, there are 21 stations monitoring the air. Fifteen of these stations mon- 
itor all three pollutants, while the remaining six (stations O to T in fig. 1) measure 
TSP only. Seventeen of these sites have been in operation for more than 5 years. Sta- 
tions B, D, K, and N have undergone relocation since their initial installation. How- 
ever, because of the proximity of their present sites to their former sites, we have 
assumed that essentially the same environment has been measured throughout the 5-year 
Air Pollution Control Office, 2785 Broadway
Brooklyn Y.M.C.A., West 25 St. and Denison
Cleveland Health Museum, 8911 Euclid
Collinwood High School, East 152 and St. Clair
Cudell Recreation Center, West Boulevard and Detroit
Estabrook Recreation Center, Fulton and Memphis
Fire Station 13, 4749 Broadway
G. Washington Elementary School, 16210 Lorain
Harvard Yards, 4150 East 49 St.
J. F. Kennedy High School, 17100 Harvard
P. L. Dunbar Elementary School, 2200 West 28 St.
Almira Elementary School, West 98 St. and Almira
Fire Station 29, East 105 St. and Superior
John Adams High School, 3817 East 116 St.
J. F. Rhodes High School, 5100 Biddulph
St. Joseph High School, 18491 Lake Shore Blvd.
Supplementary Education Center, 1365 E. 12 St.
St. Vincent Charity Hospital, E. 22 St.

Figure 1. - Air pollution monitoring sites for Cleveland, Ohio. The heavy line down the center is the Cuyahoga River. The municipal
boundaries have been straightened somewhat but are accurate in their essential features.
period. Currently, the air is sampled every third day, although the sampling frequency
has varied over the 5 years and has been as low as once a week. Some of these data 
have been presented elsewhere in a more preliminary manner (ref. 1). The data anal- 
ysis reported herein was performed by the Environmental Research Office of the NASA 
Lewis Research Center (LeRC) as part of the preliminary phase of a joint APCD-LeRC 
program to study trace elements and compounds in airborne particulate matter. 


CLEVELAND AEROMETRIC DATA 

Pertinent results are presented in tables I, II, and III for TSP, NO2, and SO2,
respectively. In each table, the first column gives an alphabetic designation of the
monitoring site corresponding to the code shown in figure 1. The second column lists the
various parameters of interest for each of the pollutants. These parameters are
(1) number of days observed (readings); (2) geometric (TSP) or arithmetic (SO2 and
NO2) averages; (3) standard geometric deviation; (4) estimated value of the second
largest pollution level for the year; and (5) an adjusted Kolmogorov-Smirnov
goodness-of-fit statistic for lognormality, denoted as $\sqrt{N}\,D$.

Air-quality standards are set nationally by the Environmental Protection Agency
(EPA) of the Federal Government (ref. 2) and statewide by the Air Pollution Control 
Board of the Department of Health (DoH) of the State of Ohio (ref. 3). Whenever these 
two standards differ, we have chosen to work with the DoH (more stringent) standard, 
which is listed in the third column. In the remaining five columns are the various sta- 
tistics for each of the years 1967 to 1971. 


Number of Readings 

For each pollutant, both EPA and DoH require a minimum of one sampling every
sixth day, or an equivalent set of at least 61 random samples per year. Thus, we
designate this standard as >60 in the tables. Even though early in the program some
stations did not achieve 60 samples per year for each pollutant, we have included the
analyses of these data sets in this report. At present, the nominal schedule of APCD calls
for monitoring the environmental air every third day. In practice, this procedure
generally allows sufficient margin for unanticipated disruptions (e.g., equipment failure)
while still exceeding 60 readings per year.



TABLE I. - TOTAL SUSPENDED PARTICULATE DATA SUMMARY FOR 1967 TO 1971

Number of readings
Geometric average
Standard geometric deviation
Second highest reading
Goodness-of-fit statistic, $\sqrt{N}\,D$

a The calculation used to obtain this estimate assumed lognormality despite $\sqrt{N}\,D > 0.736$.
b Sampling site was relocated within same general neighborhood in midyear. It is assumed
that for sampling purposes the environmental air was the same at both locations.
c Temporarily discontinued because of construction at sampling site.
TABLE I. - Concluded. TOTAL SUSPENDED PARTICULATE DATA SUMMARY FOR 1967 TO 1971


Monitoring
station (see
fig. 1)

Statistic 

Standard 

1967 

1968 

1969 

1970 

1971 

L 

Number of readings 

>60 

■ 

9 

■ 

37 

73 


Geometric average 

60 


m 


170 

212 


Standard geometric deviation 




■ 

Ha 

1.6 


Second highest reading 

150 

BP 

■ 

■ 


637 


Goodness -of -fit statistic, D 



■ 

■ 

on 

0. 64 

M 

Number of readings 

mm 

60 

72 

74 

89 

72 


Geometric average 

HII: 

86 

82 

75 

86 

82 


Standard geometric deviation 


1. 5 

1. 6 

1. 5 

1. 6 

■O 


Second highest reading 

150 

266 

281 

222 

294 



Goodness -of -fit statistic, D 


0. 48 

0.64 

0.60 

0.62 

0. 59 

N 

Number of readings 

>60 

48 


73 

b 75 

11 


Geometric average 

60 

129 


142 

134 

El 


Standard geometric deviation 


1. 8 

1. 8 

1.9 

2. 4 

2. 0 


Second highest reading 

150 

592 

784 

747 

a 1 273 

905 


Goodness -of -fit statistic, ^/N D 

1 

0. 60 

0. 57 

0. 67 

0. 99 

0. 71 

0 

Number of readings 

mm 

69 

75 

72 

90 

91 


Geometric average 

■h 

92 

86 

79 

89 

89 


Standard geometric deviation 


1. 5 

1. 6 

1.6 

1. 7 

1. 8 


Second highest reading 

150 

265 

298 

a 270 

333 

422 


Goodness-of-fit statistic, ^N D 


0.62 

0. 39 

0. 83 

0. 71 

0. 55 

P 

Number of readings 

■9 

62 

74 

72 

93 

74 


Geometric average 

■9 

135 

139 

127 

137 

146 


Standard geometric deviation 


1. 4 

1. 5 

1.6 

1.5 

1. 4 


Second highest reading 

150 

343 

390 

407 

412 

371 


Goodness-of-fit statistic, ^N D 


0.71 

0. 40 

0.64 

0. 55 

0. 60 

Q 

Number of readings 

1EI 


69 

70 

88 

79 


Geometric average 

189 


95 

96 

106 

101 


Standard geometric deviation 


1. 5 

1. 5 

1.4 

1.8 

1. 4 


Second highest reading 

150 

310 

277 

241 

a 495 

256 


Goodness-of-fit statistic, ^N D 


0.62 

0. 42 

0. 67 

0. 97 

0.65 

R 

Number of readings 

mm 

57 

72 

65 

90 

66 


Geometric average 

mm 

81 

80 

81 

89 

89 


Standard geometric deviation 


HI 

1. 7 

1.6 

1.6 

1.7 


Second highest reading 

150 


304 

285 

309 

384 


Goodness-of-fit statistic, ^N D 


0. 44 

0. 69 

0. 52 

0. 49 

0.60 

S 

Number of readings 

>60 

m 

■ 

s 


51 


Geometric average 

60 



■ 


92 


Standard geometric deviation 






1. 5 


Second highest reading 

- 150 



■ 


290 


Goodness-of-fit statistic, D 


■ 

■ 

■ 

1 

0. 71 

T 

Number of readings 

>60 

| 

91 

■ 


41 


Geometric average 

60 





170 


Standard geometric deviation 



■ 

■ 

■ 

2.0 


Second highest reading 

150 


■ 


■ 

1014 


Goodness-of-fit statistic, ^/n D 



■ 

■ 


0. 48 

U 

Number of readings 

MM 

■ 

■ 

■ 

■ 

d 26 


Geometric average 

mm 





162 


Standard geometric deviation 



■ 

■ 

1 1 

1. 5 


Second highest reading 

150 




1 



Goodness-of-fit statistic. D 


■ 

■ 

■ 

■ 



a The calculation used to obtain this estimate assumed lognormality despite $\sqrt{N}\,D > 0.736$.
b Sampling site was relocated within same general neighborhood in midyear. It is assumed
that for sampling purposes the environmental air was the same at both locations.
c Temporarily discontinued because of construction at sampling site.
d Sampling was initiated in the latter part of the year.
TABLE II. - NITROGEN DIOXIDE DATA SUMMARY FOR 1968 TO 1971


Monitoring
station (see
fig. 1)

Statistic 

Standard 

1968 

1969 

1970 

1971 

Monitoring
station (see
fig. 1)

Statistic 

Standard 

1968 

1969 

1970 

1971 

A 

Number of readings 


71 

73 

84 

86 

I 

Number of readings 

>60 

67 

76 

in 

88 


Geometric average 


211 

220 

214 

202 


Geometric average 


247 

253 

238 

217 


Standard geometric deviation 


1. 4 

1.4 

1. 4 

1. 5 


Standard geometric deviation 


1.4 

1. 3 

1.3 

1. 5 


Second highest reading 


517 

470 

464 

538 


Second highest reading 


535 

495 

a 495 

a 615 


Goodness-of-fit statistic, \^N D 



HB 




Goodness-of-fit statistic, 04 D 


0. 45 

0. 71 

1. 1 

0. 93 

B 

Number of readings 

>60 


■ 

■ 

81 

J 

Number of readings 

>60 


52 

113 

93 


Geometric average 

100 


■ 


190 


Geometric average 



225 

255 

240 


Standard geometric deviation 




11 

1. 5 


Standard geometric deviation 



1. 4 

1. 4 

1.5 


Second highest reading 



l 


a 539 


Second highest reading 



488 

a 548 

600 


Goodness-of-fit statistic, 04 D 


BR 

i 

■ 



Goodness-of-fit statistic, 01 D 



0.65 

0. 82 

0. 58 

c 

Number of readings 


76 

75 

115 

96 

K 

Number of readings 



74 


88 


Geometric average 


177 

248 

234 

255 


Geometric average 



192 


183 


Standard geometric deviation 


1. 5 

1.3 

1. 4 

1. 6 


Standard geometric deviation 



1. 4 

1. 4 

1. 6 


Second highest reading 


a 495 

a 454 

a 576 

835 


Second highest reading 



417 

a 486 

565 


Goodness-of-fit statistic, 04 D 


0. 87 



0.64 


Goodness-of-fit statistic, 0 D 






D 

Number of readings 

>60 

55 


b 83 

c 47 

L 

Number of readings 

>60 



41 

80 


Geometric average 



219 

217 

199 


Geometric average 




220 

219 


Standard geometric deviation 


2.0 

1. 3 

1. 5 

1. 4 


Standard geometric deviation 




1.4 

i. 5 


Second highest reading 


a 1056 

424 

a 576 

465 


Second highest reading 




513 

572 


Goodness-of-fit statistic, 04 D 


1.65 

0.70 

1.03 

KSH 


Goodness-of-fit statistic, 04 D 





0.71 

E 

Number of readings 

>60 

69 

74 

Iffl 

96 


Number of readings 

>60 

55 

74 

96 

73 


Geometric average 


■v.'' 

237 

Km 

Em 


Geometric average 


157 

168 

176 

159 


Standard geometric deviation 


1.4 

1. 3 

1. 4 

1. 6 


Standard geometric deviation 


1. 4 

l. 3 

1. 3 

1.6 


Second highest reading 


497 

a 437 

a 504 

a 686 


Second highest reading 


a 342 

335 

341 



Goodness-of-fit statistic, 04 D 


0.70 

m 

1. 39 

1. 69 


Goodness-of-fit statistic, 04 D 


m 

m 

0.65 


F 

Number of readings 

>60 

47 

74 

96 

86 


Number of readings 




39 

88 


Geometric average 

100 

212 

197 

215 

FT> • 


Geometric average 


■ 

■ 

208 

223 


Standard geometric deviation 


1. 4 

1. 3 

1. 3 

1. 5 


Standard geometric deviation 


1 


1.6 

1. 6 


Second highest reading 


J 5U 

a 370 

444 

a 518 


Second highest reading 


■ 

■ 

647 

a 7 12 


Goodness-of-fit statistic, 04 D 


0. 78 

0. 76 

0. 70 

0.93 


Goodness-of-fit statistic, 04 D 



1 

0. 65 

0. 95 

G 

Number of readings 

>60 

72 

72 

104 

89 

U 

Number of readings 

>60 




d 36 


Geometric average 


■a 

221 

224 



Geometric average 





230 


Standard geometric deviation 


1.5 

1. 3 

1. 3 

1. 5 


Standard geometric deviation 





1.9 


Second highest reading 


571 

a 432 

453 

516 


Second highest reading 





a 1030 


Goodness-of-fit statistic, 04 D 



0.91 

0. 43 

0.65 


Goodness-of-fit statistic, 04 D 





1.34 

H 

Number of readings 



71 

114 

78 




■ 

■ 

■ 

H; 


Geometric average 



225 

213 

202 



■SSH 

gg 


■ 



Standard geometric deviation 


Is 

1. 3 

1. 4 

m 









Second highest reading 


a 471 

a 443 





■ ■ 


1 

1 

■ 


Goodness-of-fit statistic. 04 D 


1.03 

0. 75 


IS 



m 

■ 





a The calculation used to obtain this estimate assumed lognormality despite $\sqrt{N}\,D > 0.736$.
b Sampling site was relocated within same general neighborhood in midyear. It is assumed that for sampling purposes the environmental air was the same at both
locations.
c Temporarily discontinued because of construction at sampling site.
d Sampling was initiated in the latter part of the year.
TABLE III. - SULFUR DIOXIDE DATA SUMMARY FOR 1968 TO 1971


Monitoring
station (see
fig. 1)

Statistic 

Standard 

1968 

1969 

1970 

1971 

Monitoring 
station (see 
fig. 1)

Statistic 

Standard 

1968 

1969 

1970 

1971 

A 

Number of readings 

>60 

71 

74 

82 

88 

I 

Number of readings 

>60 

64 

77 

108 

83 


Geometric average 

60 

137 

J35 

116 

84 


Geometric average 

60 

129 

110 

101 

67 


Standard geometric deviation 


2. 4 

2. 0 

1. 9 

2. 2 


Standard geometric deviation 


1. 8 

1.8 

1. 9 

2. 1 


Second highest reading 

260 

a 972 

a 674 

a 5l8 

523 


Second highest reading 

260 

a 522 

467 

a 449 

a 358 


Goodness-of-fit statistic, ^/n D 


0.75 

0. 96 

0. 88 

0.66 


Goodness-of-fit statistic, \^N D 


1. 04 


0. 87 

0. 90 

B 

Number of readings 

>60 



9 

86 

J 

Number of readings 

>60 

■ 

52 

113 

93 


Geometric average 

60 




50 


Geometric average 

60 

■ 

113 

124 

79 


Standard geometric deviation 





2. 1. 


Standard geometric deviation 


■ 

1.9 

IQ 

2. 0 


Second highest reading 

260 




284 


Second highest reading 

260 

■ 

543 


a 410 


Goodness-of-fit statistic, ^N D 





0. 70 


Goodness-of-fit statistic, -04 D 


■ 

0. 53 

Hi 

1. 23 

C 

Number of readings 

>60 

72 

76 

105 

93 

K 

Number of readings 

>60 

74 

75 


81 


Geometric average 

60 

95 

85 

74 

67 


Geometric average 

60 

53 

58 

59 

49 


Standard geometric deviation 


2. 4 

2. 3 

2. 3 

2. 4 


Standard geometric deviation 


2.5 

2. 1 

1.9 

2. 4 


Second highest reading 

260 

644 

546 

476 

485 


Second highest reading 

260 

399 

320 

258 

a 359 


Goodness-of-fit statistic, \/n D 


0. 61 

0. 48 

0.54 

0. 73 


Goodness-of-fit statistic, ^N D 


0. 55 

0. 57 

0. 64 

0.83 

D 

Number of readings 

>60 

53 

72 

b 79 

c 45 

L 

Number of readings 

>60 



42 

79 


Geometric average 

60 

106 

■m 


89 


Geometric average 




157 

116 


Standard geometric deviation 


1.8 

1. 7 

2.0 

2.0 


Standard geometric deviation 




1. 7 

2.6 


Second highest reading 

260 

413 

278 

a 538 

a 469 


Second highest reading 

260 



569 

a !0!3 


Goodness-of-fit statistic, ^N D 


0. 52 

0.47 

0.91 

0. 76 


Goodness-of-fit statistic, ^N D 




0. 62 


E 

Number of readings 

>60 

71 

75 

m 

94 

M 

Number of readings 

>60 

53 

73 

98 

58 


Geometric average 

60 

1 12 

10 7 

96 

65 


Geometric average 

60 

50 

55 

58 

41 


Standard geometric deviation 


1.9 

1. 6 

1.8 

2. 1 


Standard geometric deviation 


1.9 

1.9 

2. 3 

2.6 


Second highest reading 

260 

476 

314 

a 397 

375 


Second highest reading 

260 


235 

309 

a 372 


Goodness-of-fit statistic, ^N D 


0.68 

0. 42 

0.88 



Goodness-of-fit statistic, ^N D 



|Q 

0.67 

0. 74 

F 

Number of readings 

>60 

47 

75 

97 

86 

N 

Number of readings 

>60 



35 

81 


Geometric average 

60 

84 

76 

90 

59 


Geometric average 

60 



68 

72 


Standard geometric deviation 


1.9 

2. 1 

1. 8 

2.3 


Standard geometric deviation 




2.6 



Second highest reading 

260 

a 364 

a 409 

373 



Second highest reading 




a 548 



Goodness-of-fit statistic, ^N D 


0.80 

1.04 

0.68 



Goodness-of-fit statistic, \/n D 




0. 76 


G 

Number of readings 

>60 

69 

71 

105 

86 

U 

Number of readings 



■ 

n 

d 34 


Geometric average 

60 

77 

58 

63 

50 


Geometric average 



■ 

■ 

114 


Standard geometric deviation 


2. 1 

2.0 

1. 9 

2. 4 


Standard geometric deviation 





2. 3 


Second highest reading 

260 

414 

294 

Xj 

jgg 


Second highest reading 





137 


Goodness-of-fit statistic, ^/n D 


0. 57 

0. 70 




Goodness-of-fit statistic, yN D 





0. 55 

H 

Number of readings 

>60 

62 

71 


72 





| 

1 

p 


Geometric average 

60 

64 

63 

66 

48 





■ 

n 



Standard geometric deviation 


2. 3 

2. 3 

2. 2 

2. 4 




hi 

m 




Second highest reading 

260 

a 4 16 

mm 

MM 

336 




■ si 

ift 




Goodness-of-fit statistic. Vn D 


0. 85 

0. 69 

0. 47 

0. 72 









a The calculation used to obtain this estimate assumed lognormality despite $\sqrt{N}\,D > 0.736$.
b Sampling site was relocated within same general neighborhood in midyear. It is assumed that for sampling purposes the environmental air was the same at both
locations.
c Temporarily discontinued because of construction at sampling site.
d Sampling was initiated in the latter part of the year.
Geometric and Arithmetic Averages


The geometric average is used in table I, and the arithmetic average is used in
tables II and III. This corresponds to the particular averaging method stipulated by EPA
and DoH standards. Calculations were performed whenever the number of readings
exceeded 10. The values listed as standards are the DoH primary standards, which
correspond to the EPA secondary standards.


Standard Geometric Deviation (SGD) 

It has been noted that, irrespective of sampling duration or location, air-sampling 
data are generally distributed lognormally (ref. 4). When such is actually the case, the 
entire data set is sufficiently described by its geometric average and SGD. The higher 
the SGD, the greater the spread between the lower and higher values. As with the aver- 
ages, SGD was calculated for data sets of more than 10 readings. 
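
As an illustration of the two summary statistics just described, the following is a minimal Python sketch and not the program used for this report. The function name `geometric_summary` and the sample readings are hypothetical. It assumes the SGD is the exponential of the sample standard deviation of the logarithms, with N - 1 in the denominator.

```python
import math


def geometric_summary(readings):
    """Return (geometric average, standard geometric deviation).

    Hypothetical helper: both statistics are computed from the natural
    logarithms of the readings, which must all be positive.
    """
    logs = [math.log(x) for x in readings]
    n = len(logs)
    mean_log = sum(logs) / n
    # Sample variance of the logarithms (N - 1 in the denominator).
    var_log = sum((y - mean_log) ** 2 for y in logs) / (n - 1)
    return math.exp(mean_log), math.exp(math.sqrt(var_log))


# Made-up readings, for illustration only.
gm, sgd = geometric_summary([50.0, 100.0, 200.0])
print(f"geometric average = {gm:.1f}, SGD = {sgd:.2f}")
```

For a lognormal data set, about two-thirds of the readings are expected to fall between the geometric average divided by the SGD and the geometric average multiplied by the SGD, which is one way to read the "spread" mentioned above.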


Second Largest Value 

Both EPA and DoH standards for TSP and SO2 specify that a certain level of
pollution is ". . . not to be exceeded more than one time per year." This implies that for
the 365 daily pollution levels per year (366 for leap years), there is no upper bound on
the largest single level. However, the next largest value (i.e., the second most
polluted day of the year) is required to be at or below the standard. Thus, tables I, II,
and III include estimates of the second highest pollution level for each year. As with the
averages, the values listed here are the DoH primary standards, which correspond to
EPA secondary standards. While NO2 has only a standard for the annual average, we
believe the estimated second largest level for a year is useful information and we have
included it in table II.

An approximation to the second largest pollution level estimate, for a year of n
days and a sample of N observations, is obtained by the following procedure. (The
transformation to the logarithms of the data values is made because the expected values
of normal order statistics are well developed in the literature, whereas we are not
aware of any comparable development for lognormal distributions.) The logarithms
$y_i = \ln(x_i)$ of the pollution levels $x_i$ are computed. According to the assumption of
lognormality, these $y_i$ values follow a normal distribution. The sample mean $\bar{y}$ and
sample standard deviation $s_y$ of the set of logarithms are computed. From reference 4,
the expected value of the second largest observation in a sample of 365 (366 in a leap
year) independent values from a normal distribution is 2.63 (to three significant digits)
standard deviations from the mean. This value, along with the average $\bar{y}$ and the
standard deviation $s_y$ of the set of logarithms, is used in the following equation to
obtain the estimate of the second largest pollution level of the year:

$$y_{\mathrm{2nd}} = \bar{y} + 2.63\, s_y \tag{1}$$

The values of $x_{\mathrm{2nd}}$ listed in tables I, II, and III are obtained by exponentiation, as

$$x_{\mathrm{2nd}} = \exp(y_{\mathrm{2nd}}) \tag{2}$$

Because of the decreased precision which occurs when extrapolating to the tail of a
distribution and because the sample mean and standard deviation are used, the minimum
number of readings for this calculation was increased to 30 as opposed to 10 used for
the averages. Implicit in using equation (1) is the assumption of lognormality of the
data, which leads us to the final entry in these tables.
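
The procedure of equations (1) and (2) can be sketched in a few lines of Python. This is an illustrative sketch only. The function name `estimate_second_highest` is hypothetical, and treating the 30-reading minimum as "at least 30" is an assumption, since the text does not say whether exactly 30 readings qualifies.

```python
import math

K_SECOND = 2.63  # expected second-largest normal order statistic, from the text


def estimate_second_highest(readings, k=K_SECOND, min_readings=30):
    """Estimate the second highest level of the year, assuming lognormality.

    Returns None when there are fewer than min_readings readings
    (assumed interpretation of the 30-reading minimum).
    """
    if len(readings) < min_readings:
        return None
    logs = [math.log(x) for x in readings]
    n = len(logs)
    mean_log = sum(logs) / n
    # Sample standard deviation of the logarithms (N - 1 denominator).
    s_y = math.sqrt(sum((y - mean_log) ** 2 for y in logs) / (n - 1))
    y_2nd = mean_log + k * s_y   # equation (1)
    return math.exp(y_2nd)       # equation (2)
```

Equivalently, the estimate is the geometric average multiplied by the SGD raised to the power 2.63, so a modest increase in the SGD raises the estimated second highest level considerably.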


Kolmogorov-Smirnov Statistic 

The Kolmogorov-Smirnov statistic is a goodness-of-fit statistic which can be
applied to any distribution (ref. 5). In testing for a lognormal distribution, it is easier
for calculation purposes to take the logarithms of the values and test for goodness-of-fit
to a normal distribution. This statistic was originally intended for use when the
distribution which the data is suspected of following is completely specified. For the normal
distribution, this is equivalent to knowing the mean $\mu$ and the standard deviation $\sigma$. In
this case, the Kolmogorov-Smirnov statistic is denoted D and is calculated as


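$$D = \max_i \left| F_N(y_i) - \Phi\!\left(\frac{y_i - \mu}{\sigma}\right) \right| \tag{3}$$

where $F_N(y)$ denotes the observed cumulative distribution function of the logarithms
and the function $\Phi(z)$ denotes the cumulative standard normal distribution function.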

The statistic D measures the maximum deviation of the observed cumulative dis- 
tribution function from the theoretical cumulative distribution function. Thus, D is 
always a value between zero and one. A value of zero would indicate a perfect fit of the 
sampled data to a lognormal distribution, and larger values indicate an increasing de- 
viation from lognormality. 

When the mean and the standard deviation are unknown, it is common to use the
estimates $\bar{y}$ and $s_y = [\sum_i (y_i - \bar{y})^2/(N - 1)]^{1/2}$ in place of $\mu$ and $\sigma$. Lilliefors has
studied the use of the Kolmogorov-Smirnov statistic in this situation (ref. 6). Table IV
of this report presents the significance levels of $\sqrt{N}\,D$ from reference 6 for samples
of N > 30. Thus, the statistics in tables I, II, and III are presented as $\sqrt{N}\,D$.


TABLE IV. - SIGNIFICANCE LEVELS FOR THE KOLMOGOROV-SMIRNOV GOODNESS-OF-FIT STATISTIC
[From ref. 6.]

Significance level, $\alpha$             0.20     0.15     0.10     0.05     0.01
Statistic, $(\sqrt{N}\,D)_\alpha$        0.736    0.768    0.805    0.886    1.031


It should be recognized that the observed pollution levels are but a sample of levels
from some distribution. Thus, even if the distribution of the complete set of pollution
levels is indeed lognormal, some of the samples will lead to large values of $\sqrt{N}\,D$.

The interpretation of the tabulated significance levels $\alpha$ is that if the distribution is
indeed lognormal, then about $100\alpha$ percent of the samples tested will lead to a value of
$\sqrt{N}\,D$ which exceeds $(\sqrt{N}\,D)_\alpha$, whereas about $100(1 - \alpha)$ percent will lead to a value of
$\sqrt{N}\,D$ lower than $(\sqrt{N}\,D)_\alpha$. Because subsequent calculations in this report depend
heavily on the assumption of lognormality, the value of $\alpha = 0.20$ was chosen. Choosing
this large value for $\alpha$ has the drawback of rejecting the assumption of lognormality a
substantial proportion of the times that the distribution is lognormal. However, it has
the compensating advantage of being more discriminating against distributions which are
not lognormal.
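
The test described in this section can be sketched as follows. This is a hedged illustration, not the original computation: the names `sqrt_n_d` and `rejects_lognormality` are hypothetical, and the code uses the common two-sided empirical-distribution form of the Kolmogorov-Smirnov statistic (checking the step just below and just above each ordered value), which may differ in detail from the form used to build the tables. The mean and standard deviation are estimated from the logarithms, as in the Lilliefors setting, and the critical value 0.736 is the $\alpha = 0.20$ entry of table IV.

```python
import math

CRITICAL_020 = 0.736  # table IV entry for alpha = 0.20 (samples of N > 30)


def sqrt_n_d(readings):
    """Return sqrt(N) * D for a lognormality test with estimated parameters."""
    logs = sorted(math.log(x) for x in readings)
    n = len(logs)
    mean = sum(logs) / n
    sd = math.sqrt(sum((y - mean) ** 2 for y in logs) / (n - 1))
    d = 0.0
    for i, y in enumerate(logs, start=1):
        # Standard normal cumulative distribution at the standardized value.
        cdf = 0.5 * (1.0 + math.erf((y - mean) / (sd * math.sqrt(2.0))))
        # Largest gap between empirical and theoretical distributions.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return math.sqrt(n) * d


def rejects_lognormality(readings, critical=CRITICAL_020):
    """True when the sample would be footnoted as departing from lognormality."""
    return sqrt_n_d(readings) > critical
```

Because $\alpha = 0.20$, roughly one truly lognormal sample in five would still be flagged by `rejects_lognormality`, which is the drawback noted above.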


LOGNORMALITY 
Lognormal Plots 

As a graphical means of assessing the goodness-of-fit of the data to a lognormal 
distribution, we can enter the observed data on lognormal probability graphs. Figures 
2 and 3 show two such plots for TSP. The solid line indicates the plot of the cumulative 
sample distribution of all measurements over the 5-year period. The data points pre- 
sent the separate sample distributions for the 5 years (1967 to 1971). Any steady in- 
crease or decrease in the pollutant concentrations would be discernible as a vertical 
sequence of the data points representing those years. In the two cases shown, there is 
no overall trend. Figure 2 is for station I in the industrial valley. The overprinting
of the data points shows the TSP levels to be fairly uniform at a rather high average 
level for the 5-year period. Figure 3 represents station K, in a residential neighbor- 
hood, predominantly upwind from the industrial region. 

A full set of lognormal curves for all 21 stations for the three pollutants is available 
on microfiche from the authors upon request. 


Goodness of Fit 

To indicate the decreasing likelihood of lognormality as $\sqrt{N}\,D$ increases, all values
calculated on the assumption of lognormality for which the goodness-of-fit statistic is
outside the 20-percent significance level (i.e., the data having $\sqrt{N}\,D > 0.736$) are
footnoted in the tables. For a further indication of lognormality, as well as for a check on
the consistency of our data, we examined the distribution of sets for which
$\sqrt{N}\,D > 0.736$.

Table V summarizes the results of the goodness-of-fit tests in which the a = 0. 20 
significance level was used. The first column lists the station identification. The re- 
maining columns list for each of the pollutants the number of yearly tests which were 
performed and the number of these tests which rejected the assumption of lognormality. 
For TSP, there are 85 tests, of which 20 were rejections. This is very close to the
expected number of rejections (17) and implies that the distribution of TSP may very safely be
considered to be lognormal. For NO2 and SO2, however, there are more than twice as
many rejections as would be expected, and hence their closeness to a lognormal
distribution is somewhat suspect. On the basis of an examination of the lognormal plots of
SO2 and the fact that the SO2 departure from lognormality, as indicated by $\sqrt{N}\,D$, is not
severe, we will proceed on the assumption that the lognormal is still a useful
approximation to the distribution of SO2.
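
The expected numbers of rejections used in this comparison are simply $\alpha$ times the number of tests. The short Python sketch below reproduces that arithmetic and adds a binomial tail probability as one possible way to judge how unusual an observed count is. The function names are hypothetical, and the binomial calculation assumes that the yearly tests are independent, which this report does not claim.

```python
from math import comb


def expected_rejections(n_tests, alpha=0.20):
    """Expected number of rejections when every sample is truly lognormal."""
    return alpha * n_tests


def prob_at_least(k, n_tests, alpha=0.20):
    """Binomial probability of k or more rejections (assumes independent tests)."""
    return sum(
        comb(n_tests, j) * alpha ** j * (1.0 - alpha) ** (n_tests - j)
        for j in range(k, n_tests + 1)
    )


# Counts of tests and rejections taken from table V.
for name, tests, rejected in [("TSP", 85, 20), ("NO2", 49, 23), ("SO2", 49, 20)]:
    print(name, expected_rejections(tests), prob_at_least(rejected, tests))
```

A small tail probability for NO2 or SO2 would be consistent with the statement that their lognormality is somewhat suspect, while a tail probability of ordinary size for TSP would be consistent with treating TSP as lognormal.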

Further examination of table V shows that the lognormality of TSP, SO2, and NO2 is
most questionable at stations E, F, and I. Benarie (ref. 7) and Mitchell (ref. 8) have
each considered the additivity of lognormal distributions. Mitchell has shown that under 
certain conditions the sum of independent and identically distributed lognormal variates 
also follows a lognormal distribution. Benarie has considered a more general situation, 
where the lognormal variates have differing geometric means and standard geometric 
deviations. His conclusions are that when a large number (>10) of lognormal variates 
with slightly differing geometric means are superimposed, the resulting distribution is 
still well approximated by a lognormal distribution. However, when a small number 
(<10) of lognormal variates with differing means are superimposed, the resulting dis- 
tribution generally is not a lognormal. Thus, it is possible to conjecture that pollution 
TABLE V. - SUMMARY OF RESULTS OF GOODNESS-OF-FIT TESTS

Monitoring      Total suspended         Nitrogen dioxide        Sulfur dioxide
station (see    particulate
fig. 1)         Number      Rejected    Number      Rejected    Number      Rejected
                of tests                of tests                of tests

A               4           2           4           [?]         4           3
B               5           0           1           [?]         1           0
C               5           1           4           [?]         4           0
D               4           0           4           [?]         4           [?]
E               5           3           4           [?]         4           [?]
F               5           2           4           [?]         4           [?]
G               4           1           4           1           4           [?]
H               4           0           4           [?]         4           [?]
I               5           3           4           [?]         4           3
J               5           3           3           [?]         3           [?]
K               5           2           4           [?]         4           [?]
L               2           0           2           0           2           [?]
M               5           0           4           1           4           [?]
N               5           1           2           1           2           2
O               5           1
P               5           0
Q               5           1
R               5           0
S               1           0
T               1           0
U                                       1           1           1           0

Total           85          20          49          23          49          20

Percentage of
tests rejected              24                      47                      41

Expected number
of rejections               17                      9.8                     9.8


levels at stations E, F, and I are dominated by a small number of major sources, 
whereas the remaining stations reflect the influence of either a single large source or a 
superposition of many sources. 


AIR QUALITY 

Among the goals of APCD are monitoring of the environmental air, determination 
of its quality, and initiation of action to improve the local air quality, where indicated.
There are well established techniques for analysing lognormal plots to extract informa- 
tion pertinent to determining compliance with air-quality standards and/or the existence 
of long-term trends (ref. 9). However, it is often desirable to have available some sin- 
gle number, or index, which presents as simply as possible a maximum of information. 
To this end we have developed an index, which we call Polludex, which gages the con- 
formity of the measured environment to the established standards. 


Polludex, An Air-Pollution Index 

Many indices have been proposed and a number are in use by various agencies 
(ref. 10). Polludex is a variation of an index proposed by Pikul (ref. 11). The rationale 
for constructing this modified index is as follows. The standards for TSP and SO2
specify values for the annual mean which may not be exceeded and also values which may 
not be exceeded more than once per year. In relation to a lognormal plot of the under- 
lying population, these standard values specify the coordinates of two points on a straight 
line. If the data obtained during a 1-year period conform to lognormality and conform to 
the required standards, the plot of the data will closely approximate a straight line fall- 
ing entirely below (or on) the line segment joining the standard points. 
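
The two-point construction can be sketched numerically. The following is a minimal illustration, not the authors' procedure. It assumes that the mean standard is a geometric mean (so that it sits at the median of the lognormal line, z = 0), that samples are taken daily, and that "exceeded no more than once per year" corresponds to a cumulative frequency of 1 - 1/365. The function names and the numerical values 60 and 150 are hypothetical and are not the Ohio standards.

```python
import math
from statistics import NormalDist


def standard_line(geo_mean_std, max_std, samples_per_year=365):
    """Return (z, gsd) for the straight line through the two standard points.

    z   : standard-normal quantile assumed for the once-per-year standard
    gsd : geometric standard deviation implied by the two points
    Assumes the mean standard is a geometric mean (median of a lognormal).
    """
    # Cumulative frequency assumed for the level exceeded only once per year.
    p = 1.0 - 1.0 / samples_per_year
    z = NormalDist().inv_cdf(p)
    # On a lognormal plot, ln(level) is linear in z; the slope is ln(gsd).
    slope = (math.log(max_std) - math.log(geo_mean_std)) / z
    return z, math.exp(slope)


def line_value(geo_mean_std, gsd, z):
    """Level on the standard line at standard-normal quantile z."""
    return geo_mean_std * gsd ** z


# Hypothetical standards: geometric mean 60, once-per-year maximum 150.
z_max, gsd = standard_line(60.0, 150.0)
print(f"z = {z_max:.3f}, implied geometric standard deviation = {gsd:.3f}")
```

Under these assumptions, a year of lognormal data meets both standards when its fitted line lies on or below `line_value` at z = 0 and at z = `z_max`.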

For each of the three pollutants, define

$$r = \frac{\text{Sample average}}{\text{Standard for average}}$$

$$s = \frac{\text{Estimate of second largest level}}{\text{Standard not to be exceeded more than once yearly}}$$

Then Polludex, P(pollutant), is defined for TSP and SO2 by

$$P(\mathrm{TSP},\ \mathrm{SO_2}) = 50 \times \left[\max(0,\ r - 1) + \max(0,\ s - 1)\right]$$

and for NO2 by

$$P(\mathrm{NO_2}) = 100 \times \left[\max(0,\ r - 1)\right]$$

where max(a, b) means that the larger of the two values, a or b, is to be used. The
geometric average is to be used in calculating r for TSP and the arithmetic average is
to be used in calculating r for SO2 and NO2. For the estimate of the second largest
level to be used for s we used the approximate value listed in table I for TSP and in
table III for SO2.

With this definition, the same weight is given to the long-term (chronic) effects of 
pollution as is given to the severe short-term (episode) incident. The standards for 
these pollutants have presumably been set with regard to maximum acceptable levels for 
reasons of public health and/or welfare. Thus, we assume that normalization of the 
estimated mean and second highest values by the standards will, in a sense, put each P 
on an equal basis with respect to the potential harm caused by excesses. If the air qual- 
ity is equal to or better than the standards, Polludex = 0. A value of Polludex = 100 can 
be understood to mean that the air is, in a sense, 100 percent polluted, in that a value 
of 100 is obtained when the average and the second highest values are each 100 percent 
higher than their respective permissible levels. Of course, Polludex = 100 would also
result from a continuum of other combinations, as, for example, when the second highest
value is three times its standard, provided the average is at or below its standard.
Figure 4 graphically illustrates several of these possibilities. Figure 4(a) shows three
possible examples which have P = 0. Figure 4(b) shows a line having P = 100, where
both the mean and second largest standards are exceeded. Figure 4(c) shows a line
where again P = 100, but the standard for the mean has been met. Finally, figure 4(d)
shows a line with P = 50, where the standard for the mean is not met but the other
standard is.
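
The definition and the four cases just described can be checked with a short calculation. The following is a minimal sketch of the Polludex formulas as stated above. The function name `polludex` and the convention of passing `s=None` for NO2 (which has no once-per-year term in the formula) are illustrative choices, not part of the original report.

```python
def polludex(r, s=None):
    """Polludex from the normalized ratios r and s.

    r : sample average divided by the standard for the average
    s : estimate of second largest level divided by the standard not to be
        exceeded more than once yearly; pass None for NO2, whose index
        uses only the average.
    """
    mean_excess = max(0.0, r - 1.0)
    if s is None:
        # NO2 form: P = 100 x max(0, r - 1)
        return 100.0 * mean_excess
    # TSP and SO2 form: P = 50 x [max(0, r - 1) + max(0, s - 1)]
    return 50.0 * (mean_excess + max(0.0, s - 1.0))


# Cases corresponding to the descriptions of figure 4(a) to (d).
print(polludex(0.8, 0.9))  # both standards met
print(polludex(2.0, 2.0))  # both values twice their standards
print(polludex(1.0, 3.0))  # mean standard met, other exceeded threefold
print(polludex(2.0, 0.9))  # mean twice its standard, other standard met
```

The four calls give 0, 100, 100, and 50, in agreement with the values quoted for figure 4(a) to (d). Because each excess is clipped at zero, performing better than one standard never offsets exceeding the other.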



(a) Standards for air quality are met.
(b) Pollution levels twice the allowed standards.
(c) One standard met and the other exceeded by a factor of three.
(d) One standard met and the other exceeded by a factor of two.
(Abscissa of each panel: frequency.)

Figure 4. - Examples of Polludex levels.


Four-Year Trends


Polludex was evaluated for the APCD data and is listed for all three pollutants in 
table VI. The State of Ohio standards were used in these calculations. 

Where there are adequate data, the 1968 and 1971 values are also presented as bar
graphs overprinted on the Cleveland map. The Polludex values for TSP, NO2, and SO2
are shown in figures 5(a), (b), and (c), respectively. If there are two bars, the left bar
represents 1968 and the right bar 1971. With the exception of site M of figure 5(c), a
single bar represents 1971. It is clear that, in general, TSP levels have increased to
the west of the Cuyahoga River and decreased to the east. The most pronounced improvements
are downwind of the valley (in Cleveland, the winds are predominantly out of
the southwest) at sites A, I, and E. The levels of NO2 show much less variation, except
for the increased levels at sites H and C. With one exception, there has been a
significant reduction in the levels of SO2 throughout the city, with the most pronounced
improvements occurring, as with TSP, at sites A, I, and E. Since space heating is
fueled primarily by natural gas, this implies a reduction in SO2 contamination by industrial
and power-producing sources. At this time we do not have sufficient information to
determine whether the improvements in the valley are due to the general decline in business
activity in recent years, the abatement efforts by the industrial community, both
of these reasons, or, possibly, neither of these reasons.

(b) Nitrogen dioxide.
Figure 5. - Continued.

(c) Sulfur dioxide. Value for station M for 1968 was zero.
Figure 5. - Concluded.


CONCLUDING REMARKS


Air-quality data (total suspended particulate, nitrogen dioxide, and sulfur dioxide)
for Cleveland, Ohio, for the period of 1968 to 1971 have been collated and subjected to
statistical analysis. It is apparent that the data for total suspended particulate and, to a
lesser degree, the data for sulfur dioxide and nitrogen dioxide are lognormally distributed.
The air-quality standards of the State of Ohio are met only sporadically by sulfur
dioxide in isolated residential neighborhoods. The available data indicate that definite
improvement in air quality has taken place in the industrial region. Overall, there appears
to be a net improvement in air quality, which primarily reflects
the striking reduction in sulfur dioxide levels.

A pollution index has been introduced which directly displays information regarding
the degree to which the environmental air conforms to the mandated standards.
As such, it is a useful tool in air-quality monitoring programs.

Lewis Research Center,
    National Aeronautics and Space Administration,
        Cleveland, Ohio, July 6, 1972,
            770-18.